Sample-efficient Learning and Generalization with Text Representations
Humans have a remarkable ability to learn without much supervision. Often, a few labelled instances or a single demonstration is enough for us to learn a new concept. Most of our knowledge is acquired in a weakly unsupervised manner, via reading, perception, and active interaction with the world. Machine learning models, on the other hand, struggle to learn from limited supervision and often need large amounts of labelled data to learn. In many practical instances, however, such supervision is not available. Furthermore, collecting labelled instances for training may be expensive or infeasible due to privacy reasons. This calls for approaches that can adapt to new tasks or new domains without needing a lot of labelled data.
In this thesis, I address the limited supervision problem from two perspectives. First, I examine methods that exploit large amounts of unlabelled data to learn useful feature representations in a self-supervised manner. Such representations capture rich prior knowledge about the data, allowing them to be useful across many tasks, and enable data-efficient learning of new tasks. In particular, my work is concerned with the following key questions pertaining to text representations:
(i) How do we learn representations of larger units of text, beyond words?
(ii) How do we design training objectives that can efficiently learn such representations?
(iii) How do we come up with representations that allow efficient knowledge transfer to downstream language understanding tasks?
Second, I explore models and algorithms capable of learning from limited supervision. My work studies weakly supervised, few-shot and zero-shot learning settings with applications to text generation, sequence modeling, entity understanding and embodied control. My work demonstrates that text descriptions are an effective means of building models that generalize to new domains and new tasks without needing to experience supervised data for the new domain/task. I believe that the next generation of AI technologies will be driven by models that read and understand text to perform tasks.
PHD; Computer Science & Engineering; University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/169634/1/llajan_1.pd
Generative Adversarial Text to Image Synthesis
Automatic synthesis of realistic images from text would be interesting and
useful, but current AI systems are still far from this goal. However, in recent
years generic and powerful recurrent neural network architectures have been
developed to learn discriminative text feature representations. Meanwhile, deep
convolutional generative adversarial networks (GANs) have begun to generate
highly compelling images of specific categories, such as faces, album covers,
and room interiors. In this work, we develop a novel deep architecture and GAN
formulation to effectively bridge these advances in text and image modeling,
translating visual concepts from characters to pixels. We demonstrate the
capability of our model to generate plausible images of birds and flowers from
detailed text descriptions.
Comment: ICML 201
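The core idea of text-conditional image generation can be illustrated with a minimal sketch: a text embedding is concatenated with a noise vector and fed through a generator, so the same description yields varied images. The dimensions and the single-layer "generator" below are toy stand-ins for illustration, not the paper's deep convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not the paper's actual sizes).
TEXT_DIM, NOISE_DIM, IMG_PIXELS = 16, 8, 64

# Toy "generator": a single linear layer followed by tanh, standing in
# for the deep convolutional generator trained adversarially in the paper.
W_gen = rng.normal(scale=0.1, size=(TEXT_DIM + NOISE_DIM, IMG_PIXELS))

def generate(text_embedding: np.ndarray) -> np.ndarray:
    """Sample an image conditioned on a text embedding.

    Concatenating the embedding with fresh noise is what lets one
    caption map to many plausible images.
    """
    z = rng.normal(size=NOISE_DIM)
    cond = np.concatenate([text_embedding, z])
    return np.tanh(cond @ W_gen)  # pixel values in (-1, 1)

text = rng.normal(size=TEXT_DIM)   # stands in for an encoded caption
img = generate(text)
assert img.shape == (IMG_PIXELS,)
```

In the full model, the discriminator also receives the text embedding, so it can reject images that are realistic but mismatched to the caption.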
Discriminator-Guided Multi-step Reasoning with Language Models
In the context of multi-step reasoning, language model (LM) probabilities
are often miscalibrated -- solutions with high probabilities are not always
correct. Therefore, greedy decoding, which is the standard decoding method for
reasoning tasks, often yields incorrect solutions. In addition, methods such as
self-consistency and verifiers rely on sampling from the LM distribution and do
not tackle the underlying issue. To address this, we introduce Guiding
Multi-step ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise
decoding approach that nudges the model towards producing correct reasoning
steps. GRACE employs a discriminator model, which is trained to differentiate
correct steps from invalid ones, to adjust decoding preferences based on the
correctness of each reasoning step. Importantly, GRACE does not require
fine-tuning or re-training the LMs. When compared with conventional decoding
strategies over four popular math reasoning benchmarks, GRACE exhibits
significant improvements in both final answer accuracy and step correctness,
outperforming both greedy decoding and self-consistency. Our code can
be found at https://github.com/mukhal/grace.
Comment: 19 pages, 7 figures, and 8 tables
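The decoding loop described above can be sketched in a few lines. Here `propose_steps` and `discriminator_score` are toy stand-ins: in GRACE they would be an LM sampling candidate next steps and a trained correctness discriminator, respectively.

```python
# Minimal sketch of discriminator-guided stepwise decoding in the
# spirit of GRACE. Candidate steps are re-ranked by a correctness
# score rather than by the LM's (possibly miscalibrated) probability.

def propose_steps(prefix):
    # Toy candidate generator: in practice, sample k continuations
    # from the LM given the solution-so-far.
    return [prefix + [s] for s in ("2+3=5", "2+3=6", "5*2=10")]

def discriminator_score(candidate):
    # Toy correctness score: reward steps whose equation holds.
    lhs, rhs = candidate[-1].split("=")
    return 1.0 if eval(lhs) == int(rhs) else 0.0

def grace_decode(n_steps):
    solution = []
    for _ in range(n_steps):
        candidates = propose_steps(solution)
        # Pick the step the discriminator judges most correct; no
        # fine-tuning of the LM itself is needed.
        solution = max(candidates, key=discriminator_score)
    return solution

steps = grace_decode(2)
assert all(eval(s.split("=")[0]) == int(s.split("=")[1]) for s in steps)
```

The key property mirrored here is that the base model stays frozen; only the selection among its candidate steps changes.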
Exploring Demonstration Ensembling for In-context Learning
In-context learning (ICL) operates by showing language models (LMs) examples
of input-output pairs for a given task, i.e., demonstrations. The standard
approach for ICL is to prompt the LM with concatenated demonstrations followed
by the test input. This approach suffers from some issues. First, concatenation
offers almost no control over the contribution of each demo to the model
prediction. This can be sub-optimal when some demonstrations are irrelevant to
the test example. Second, due to the input length limit of some transformer
models, it might be infeasible to fit many examples into the context,
especially when dealing with long-input tasks. In this work, we explore
Demonstration Ensembling (DENSE) as an alternative to simple concatenation.
DENSE predicts outputs using subsets (i.e., buckets) of the demonstrations and
then combines the output probabilities resulting from each subset to produce
the final prediction. We study different ensembling methods using GPT-j and
experiment on 12 language tasks. Our experiments show weighted max ensembling
to outperform vanilla concatenation by as much as 2.4 average points. Code
available at https://github.com/mukhal/icl-ensembling.
Comment: Published at the ME-FoMo workshop at ICLR 2023. The arXiv version
includes evaluation on 5 more tasks
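The bucket-and-combine scheme can be sketched as follows. `bucket_predict` is a toy stand-in for prompting the LM with one bucket of demonstrations plus the test input; the weighted-max combination rule shown is one plausible reading of the ensembling methods studied.

```python
import numpy as np

LABELS = ["positive", "negative"]

def bucket_predict(bucket, test_input):
    # Toy stand-in for an LM prompted with one demo bucket: a smoothed
    # vote over the bucket's demonstration labels.
    votes = np.array([sum(1 for _, y in bucket if y == lab)
                      for lab in LABELS], dtype=float)
    return (votes + 1) / (votes + 1).sum()

def dense_weighted_max(demos, test_input, n_buckets=2, weights=None):
    # Split demos into buckets so no single prompt must hold them all.
    buckets = [demos[i::n_buckets] for i in range(n_buckets)]
    probs = np.stack([bucket_predict(b, test_input) for b in buckets])
    if weights is None:
        weights = np.ones(len(buckets))
    # Weighted max: per label, take the max weighted probability
    # across buckets, then renormalize.
    combined = (probs * weights[:, None]).max(axis=0)
    return combined / combined.sum()

demos = [("great movie", "positive"), ("loved it", "positive"),
         ("terrible", "negative"), ("fun ride", "positive")]
p = dense_weighted_max(demos, "what a film")
assert np.isclose(p.sum(), 1.0)
assert p[0] > p[1]  # majority-positive demos favour "positive"
```

Because each bucket is prompted separately, an irrelevant demonstration only pollutes its own bucket's distribution rather than the whole prompt.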
MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning
Recently, there has been an increasing interest in automated prompt
optimization based on reinforcement learning (RL). This approach offers
important advantages, such as generating interpretable prompts and being
compatible with black-box foundation models. However, the substantial prompt
space size poses challenges for RL-based methods, often leading to suboptimal
policy convergence. This paper introduces MultiPrompter, a new framework that
views prompt optimization as a cooperative game between prompters that take
turns composing a prompt together. Our cooperative prompt optimization
effectively reduces the problem size and helps prompters learn optimal prompts.
We test our method on the text-to-image task and show its ability to generate
higher-quality images than baselines.
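The cooperative factorization can be illustrated with a toy turn-taking loop: each prompter only searches its own, smaller action space while optimizing a shared reward. The vocabularies, target set, and greedy policy below are illustrative assumptions; the paper learns the policies with RL against a black-box scorer.

```python
# Toy shared reward: overlap with a target concept set, standing in
# for a black-box scorer (e.g. an image-quality model).
TARGET = {"a", "red", "bird", "flying"}

def reward(prompt_words):
    return len(set(prompt_words) & TARGET)

# Per-prompter vocabularies: the joint prompt space is factored
# across agents, which is what shrinks the problem size.
VOCAB = [["a", "the", "one"], ["red", "bird", "flying", "blue"]]

def cooperative_compose(n_turns=4):
    prompt = []
    for t in range(n_turns):
        prompter_vocab = VOCAB[t % 2]  # prompters alternate turns
        # Each prompter greedily extends the shared prompt; in the
        # paper this choice is made by a learned RL policy.
        best = max(prompter_vocab, key=lambda w: reward(prompt + [w]))
        prompt.append(best)
    return prompt

prompt = cooperative_compose()
assert reward(prompt) >= 3
```

The composed prompt stays a readable token sequence throughout, which is what makes the approach interpretable and compatible with black-box models.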
Merging Generated and Retrieved Knowledge for Open-Domain QA
Open-domain question answering (QA) systems are often built with retrieval
modules. However, retrieving passages from a given source is known to suffer
from insufficient knowledge coverage. Alternatively, prompting large language
models (LLMs) to generate contextual passages based on their parametric
knowledge has been shown to improve QA performance. Yet, LLMs tend to
"hallucinate" content that conflicts with the retrieved knowledge. Based on the
intuition that answers supported by both sources are more likely to be correct,
we propose COMBO, a Compatibility-Oriented knowledge Merging for Better
Open-domain QA framework, to effectively leverage the two sources of
information. Concretely, we match LLM-generated passages with retrieved
counterparts into compatible pairs, based on discriminators trained with silver
compatibility labels. Then a Fusion-in-Decoder-based reader model handles
passage pairs to arrive at the final answer. Experiments show that COMBO
outperforms competitive baselines on three out of four tested open-domain QA
benchmarks. Further analysis reveals that our proposed framework demonstrates
greater efficacy in scenarios with a higher degree of knowledge conflicts.
Comment: EMNLP 2023 - Camera Ready
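The pairing step can be sketched with a toy compatibility function. Here word overlap stands in for the trained discriminators, and greedy one-to-one matching stands in for the paper's pairing procedure; the downstream Fusion-in-Decoder reader is omitted.

```python
def compatibility(generated: str, retrieved: str) -> float:
    # Toy compatibility score: Jaccard word overlap. COMBO instead
    # uses discriminators trained on silver compatibility labels.
    g, r = set(generated.split()), set(retrieved.split())
    return len(g & r) / max(len(g | r), 1)

def pair_passages(generated_passages, retrieved_passages):
    pairs, used = [], set()
    for g in generated_passages:
        # Greedily pick the most compatible unused retrieved passage,
        # so each pair carries mutually supporting evidence.
        best = max((r for r in retrieved_passages if r not in used),
                   key=lambda r: compatibility(g, r))
        used.add(best)
        pairs.append((g, best))
    return pairs

gen = ["paris is the capital of france", "the eiffel tower is in paris"]
ret = ["the eiffel tower stands in paris", "france has capital paris"]
pairs = pair_passages(gen, ret)
assert pairs[0][1] == "france has capital paris"
assert pairs[1][1] == "the eiffel tower stands in paris"
```

The intuition encoded here is the one stated in the abstract: an answer is more trustworthy when a generated passage and a retrieved passage agree.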
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Pretrained Language Models (LMs) memorize a vast amount of knowledge during
initial pretraining, including information that may violate the privacy of
personal lives and identities. Previous work addressing privacy issues for
language models has mostly focused on data preprocessing and differential
privacy methods, both requiring re-training the underlying LM. We propose
knowledge unlearning as an alternative method to reduce privacy risks for LMs
post hoc. We show that simply applying the unlikelihood training objective to
target token sequences is effective at forgetting them with little to no
degradation of general language modeling performance; it sometimes even
substantially improves the underlying LM with just a few iterations. We also
find that sequential unlearning is better than trying to unlearn all the data
at once and that unlearning is highly dependent on which kind of data (domain)
is forgotten. By showing comparisons with a previous data preprocessing method
known to mitigate privacy risks for LMs, we show that unlearning can give a
stronger empirical privacy guarantee in scenarios where the data vulnerable to
extraction attacks are known a priori while being orders of magnitude more
computationally efficient. We release the code and dataset needed to replicate
our results at https://github.com/joeljang/knowledge-unlearning
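The unlikelihood idea can be shown on a toy model: instead of maximizing log p(token), gradient-descend on -log(1 - p(token)) for the tokens to be forgotten. A single softmax over a toy vocabulary stands in for the LM; the analytic gradient below follows from the softmax derivative.

```python
import numpy as np

VOCAB = ["alice", "lives", "at", "42", "elm", "street"]
logits = np.zeros(len(VOCAB))  # toy "model": one categorical distribution

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def unlikelihood_step(logits, forget_idx, lr=1.0):
    # Gradient of L = -log(1 - p_f) w.r.t. logit j is
    # p_f * (delta_{jf} - p_j) / (1 - p_f): it pushes the forgotten
    # token's logit down and redistributes mass to the rest.
    p = softmax(logits)
    delta = np.eye(len(p))[forget_idx]
    grad = p[forget_idx] * (delta - p) / (1.0 - p[forget_idx])
    return logits - lr * grad

before = softmax(logits)[3]          # p("42") before unlearning
for _ in range(5):
    logits = unlikelihood_step(logits, forget_idx=3)
after = softmax(logits)[3]
assert after < before                # the target token is suppressed
```

In the actual method this objective is applied to whole target token sequences of a pretrained LM for a few iterations, which is why it is orders of magnitude cheaper than re-training.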
TOD-Flow: Modeling the Structure of Task-Oriented Dialogues
Task-Oriented Dialogue (TOD) systems have become crucial components in
interactive artificial intelligence applications. While recent advances have
capitalized on pre-trained language models (PLMs), they exhibit limitations
regarding transparency and controllability. To address these challenges, we
propose a novel approach focusing on inferring the TOD-Flow graph from dialogue
data annotated with dialog acts, uncovering the underlying task structure in
the form of a graph. The inferred TOD-Flow graph can be easily integrated with
any dialogue model to improve its prediction performance, transparency, and
controllability. Our TOD-Flow graph learns what a model can, should, and should
not predict, effectively reducing the search space and providing a rationale
for the model's prediction. We show that the proposed TOD-Flow graph better
resembles human-annotated graphs compared to prior approaches. Furthermore,
when combined with several dialogue policies and end-to-end dialogue models, we
demonstrate that our approach significantly improves dialog act classification
and end-to-end response generation performance in the MultiWOZ and SGD
benchmarks. Code available at: https://github.com/srsohn/TOD-Flo
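The "can / should / should not" constraints can be illustrated as a precondition graph that prunes a dialogue model's action space. The acts and conditions below are invented toy examples, not MultiWOZ or SGD annotations.

```python
# Acts the system *can* take only after their precondition acts occurred.
CAN_AFTER = {
    "book_hotel": {"request_dates", "inform_availability"},
    "inform_availability": {"request_dates"},
    "request_dates": set(),
}
# Acts the system *should not* repeat once taken.
NO_REPEAT = {"book_hotel"}

def allowed_acts(history):
    """Filter dialog acts by the inferred task-structure graph.

    This both shrinks the model's search space and gives a rationale
    ("book_hotel is blocked: dates not yet requested") for predictions.
    """
    done = set(history)
    return [act for act, preconds in CAN_AFTER.items()
            if preconds <= done
            and not (act in NO_REPEAT and act in done)]

# Early in the dialogue, only "request_dates" is licensed.
assert allowed_acts([]) == ["request_dates"]
assert "book_hotel" in allowed_acts(["request_dates", "inform_availability"])
assert "book_hotel" not in allowed_acts(
    ["request_dates", "inform_availability", "book_hotel"])
```

Any dialogue model can be wrapped this way: its candidate acts are intersected with `allowed_acts(history)` before a response is generated, which is how the graph improves both accuracy and controllability without retraining.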